
    The iCrawl Wizard -- Supporting Interactive Focused Crawl Specification

    Collections of Web documents about specific topics are needed for many areas of current research. Focused crawling enables the creation of such collections on demand. Current focused crawlers require the user to manually specify starting points for the crawl (seed URLs). These are also used to describe the expected topic of the collection. The choice of seed URLs influences the quality of the resulting collection and requires a lot of expertise. In this demonstration we present the iCrawl Wizard, a tool that assists users in defining focused crawls efficiently and semi-automatically. Our tool uses major search engines and Social Media APIs as well as information extraction techniques to find seed URLs and a semantic description of the crawl intent. Using the iCrawl Wizard even non-expert users can create semantic specifications for focused crawlers interactively and efficiently. Comment: Published in the Proceedings of the European Conference on Information Retrieval (ECIR) 201
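    The abstract describes assembling seed URLs from search-engine and Social Media API results. As a minimal illustrative sketch (not the iCrawl Wizard's actual algorithm, which the abstract does not detail), one plausible step is deduplicating ranked result URLs by host so the crawl does not start from a single site; the function and example URLs below are hypothetical:

```python
from urllib.parse import urlparse

def select_seed_urls(candidates, max_per_domain=1):
    """Pick a diverse seed set from relevance-ranked result URLs:
    keep at most `max_per_domain` URLs per host so the focused
    crawl does not start from a single site."""
    seeds, per_domain = [], {}
    for url in candidates:  # assumed ordered by relevance
        host = urlparse(url).netloc
        if per_domain.get(host, 0) < max_per_domain:
            seeds.append(url)
            per_domain[host] = per_domain.get(host, 0) + 1
    return seeds

# Hypothetical search results for a crawl about the Fukushima disaster.
results = [
    "https://example.org/fukushima",
    "https://example.org/fukushima/timeline",
    "https://news.example.com/2011/disaster",
]
print(select_seed_urls(results))
# ['https://example.org/fukushima', 'https://news.example.com/2011/disaster']
```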

    Analyzing web archives through topic and event focused sub-collections

    Web archives capture the history of the Web and are therefore an important source to study how societal developments have been reflected on the Web. However, the large size of Web archives and their temporal nature pose many challenges to researchers interested in working with these collections. In this work, we describe the challenges of working with Web archives and propose the research methodology of extracting and studying sub-collections of the archive focused on specific topics and events. We discuss the opportunities and challenges of this approach and suggest a framework for creating sub-collections.

    Should I Care about Your Opinion? Detection of Opinion Interestingness and Dynamics in Social Media

    In this paper, we describe a set of reusable text processing components for extracting opinionated information from social media, rating it for interestingness, and for detecting opinion events. We have developed applications in GATE to extract named entities, terms and events and to detect opinions about them, which are then used as the starting point for opinion event detection. The opinions are then aggregated over larger sections of text, to give some overall sentiment about topics and documents, and also some degree of information about interestingness based on opinion diversity. We go beyond traditional opinion mining techniques in a number of ways: by focusing on specific opinion-target extraction related to key terms and events, by examining and dealing with a number of specific linguistic phenomena, by analysing and visualising opinion dynamics over time, and by aggregating the opinions in different ways for a more flexible view of the information contained in the documents. EU/27023
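    The abstract mentions aggregating opinions and rating interestingness by opinion diversity, without giving formulas. A minimal sketch of one plausible realisation, assuming per-mention polarity scores in [-1, 1]: overall sentiment as the mean score, and interestingness as the normalised entropy of the positive/neutral/negative label distribution (higher entropy = more diverse opinions). All names and the entropy choice are this sketch's assumptions, not the paper's method:

```python
import math
from collections import Counter

def aggregate_opinions(opinions):
    """Aggregate per-mention polarity scores (-1..1) for one topic.
    Returns (mean sentiment, diversity-based interestingness), where
    interestingness is the entropy of the pos/neu/neg distribution,
    normalised to [0, 1] by using log base 3 (three classes)."""
    labels = ["neg" if s < 0 else "pos" if s > 0 else "neu" for s in opinions]
    counts = Counter(labels)
    n = len(opinions)
    entropy = -sum((c / n) * math.log(c / n, 3) for c in counts.values())
    return sum(opinions) / n, entropy

# Mixed opinions yield near-neutral sentiment but high interestingness.
mean, interest = aggregate_opinions([0.8, -0.6, 0.4, -0.9, 0.1])
```

Unanimous opinions give entropy 0 (uninteresting); an even split across all three labels gives 1.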

    Evaluation of Methods and Techniques for Language Based Sentiment Analysis for DAX 30 Stock Exchange: A First Concept of a "LUGO" Sentiment Indicator

    Social media companies are famous for building communities, or for taking their companies public in IPOs. However, social media are also used in stock exchange trading and for the product promotion of securities offered by financial investment companies. Stock exchange trading in particular is often driven by sentiment, that is, by fast-spreading rumors and news. Within the scope of this publication, we evaluate potential methods and techniques for language-based sentiment analysis for the purpose of stock exchange trading, and examine a possible technique for deriving a technical indicator from social media that could support investment decisions. We present a basic experimental setup and describe the LUGO Sentiment Indicator as a possible tool for supporting investment decisions based on social media sentiment analysis.
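    The abstract does not specify how the LUGO indicator is constructed. As a hypothetical illustration only, a sentiment-based technical indicator is often built like a price-based one, e.g. a simple moving average over daily net sentiment scores, with zero-line crossings read as signals; the function name and window choice below are this sketch's assumptions:

```python
def sentiment_indicator(daily_scores, window=3):
    """Simple moving average over daily net social-media sentiment
    scores. A crossing of the zero line could be read as a shift in
    market mood. (Hypothetical sketch; not the actual LUGO formula.)"""
    out = []
    for i in range(window - 1, len(daily_scores)):
        out.append(sum(daily_scores[i - window + 1 : i + 1]) / window)
    return out

# Five days of net sentiment produce three smoothed indicator values.
print(sentiment_indicator([0.2, -0.1, 0.4, 0.3, -0.5]))
```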

    Semantic URL Analytics to Support Efficient Annotation of Large Scale Web Archives

    Long-term Web archives comprise Web documents gathered over longer time periods and can easily reach hundreds of terabytes in size. Semantic annotations such as named entities can facilitate intelligent access to the Web archive data. However, the annotation of the entire archive content on this scale is often infeasible. The most efficient way to access the documents within Web archives is provided through their URLs, which are typically stored in dedicated index files. The URLs of the archived Web documents can contain semantic information and can offer an efficient way to obtain initial semantic annotations for the archived documents. In this paper, we analyse the applicability of semantic analysis techniques such as named entity extraction to the URLs in a Web archive. We evaluate the precision of the named entity extraction from the URLs in the Popular German Web dataset and analyse the proportion of the archived URLs from 1,444 popular domains in the time interval from 2000 to 2012 to which these techniques are applicable. Our results demonstrate that named entity recognition can be successfully applied to a large number of URLs in our Web archive and provide a good starting point to efficiently annotate large scale collections of Web documents.
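    The key idea above is that URL strings themselves carry entity mentions, so initial annotations can be obtained without fetching the archived pages. A minimal sketch of that idea, assuming a toy gazetteer lookup over URL path tokens (the paper's actual extraction pipeline is not described in the abstract; a real system would use a full NER model):

```python
import re
from urllib.parse import urlparse

# Hypothetical toy gazetteer; stands in for a real NER component.
GAZETTEER = {"angela merkel": "PERSON", "berlin": "LOCATION"}

def entities_from_url(url):
    """Split the URL path into word tokens on common separators and
    look up bigrams and unigrams in the gazetteer -- a cheap way to
    get initial annotations from the index alone."""
    path = urlparse(url).path.lower()
    tokens = [t for t in re.split(r"[/\-_.]+", path) if t]
    found = []
    for n in (2, 1):  # prefer longer matches first
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i : i + n])
            if gram in GAZETTEER:
                found.append((gram, GAZETTEER[gram]))
    return found

print(entities_from_url("http://news.example.de/berlin/angela-merkel-speech.html"))
# [('angela merkel', 'PERSON'), ('berlin', 'LOCATION')]
```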

    Mouse models of breast cancer metastasis

    Metastatic spread of cancer cells is the main cause of death of breast cancer patients, and elucidation of the molecular mechanisms underlying this process is a major focus in cancer research. The identification of appropriate therapeutic targets and proof-of-concept experimentation involves an increasing number of experimental mouse models, including spontaneous and chemically induced carcinogenesis, tumor transplantation, and transgenic and/or knockout mice. Here we give a progress report on how mouse models have contributed to our understanding of the molecular processes underlying breast cancer metastasis and on how such experimentation can open new avenues to the development of innovative cancer therapy.

    Extracting event-centric document collections from large-scale web archives

    Web archives created by the Internet Archive (IA) (https://archive.org), national libraries and other archiving services contain large amounts of information collected over a time period of more than twenty years. These archives constitute a valuable source for research in many disciplines, including the digital humanities and the historical sciences, by offering a unique possibility to look into past events and their representation on the Web. Most Web archive services aim to capture the entire Web (IA) or national top-level domains and are therefore broad in their scope, diverse regarding the topics they contain and the time intervals they cover. Due to the large size and the broad scope, it is difficult for interested researchers to locate relevant information in the archives, as search facilities are very limited. Many users are more interested in studying smaller and topically coherent event-centric collections of documents contained in a Web archive [1,2]. Such collections can reflect specific events such as elections or natural disasters, e.g. the Fukushima nuclear disaster (2011) or the German federal elections.
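    Web archive captures are typically accessed through index records of timestamped URLs, so a first-cut event-centric sub-collection can be obtained by filtering the index on a time window and event keywords. The sketch below illustrates that filtering step under those assumptions (the record shape, keyword matching, and example data are hypothetical, not the authors' extraction method):

```python
def event_subcollection(cdx_records, keywords, start, end):
    """Filter (timestamp, url) index records: keep captures whose
    timestamp falls in [start, end] and whose URL mentions one of
    the event keywords. Timestamps use the 14-digit CDX-style form
    YYYYMMDDhhmmss, which sorts chronologically as a string."""
    keep = []
    for ts, url in cdx_records:
        if start <= ts <= end and any(k in url.lower() for k in keywords):
            keep.append((ts, url))
    return keep

# Hypothetical index: only the in-window, on-topic capture survives.
index = [
    ("20110312080000", "http://example.org/fukushima-reactor"),
    ("20090101000000", "http://example.org/fukushima-history"),
    ("20110315120000", "http://example.org/weather"),
]
print(event_subcollection(index, ["fukushima"],
                          "20110311000000", "20111231235959"))
# [('20110312080000', 'http://example.org/fukushima-reactor')]
```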